The SSM Toolbox for Matlab 



Jyh-Ying Peng^'^, John A. D. Aston^ 



o 
o 

<N 
I— ^ 

m 

(N 

o 
u 

t/3 



> 

m 

O 

o 



X 



^Institute of Statistical Science, Academia Sinica* 
^ Computer Science and Information Engineering, National Taiwan University 



February 3, 2008 



* Address for Correspondence: 
Institute of Statistical Science, 
Academia Sinica 
128 Academia Road, Sec 2 

Taipei 115, Taiwan 

email: {jypeng,jaston} @stat.sinica.edu.tw 



2 



Peng & Aston 



Contents 



1 Introduction [5] 

2 The State Space Models g] 

2.1 Linear Gaussian models |5] 

2.2 Non-Gaussian models |6] 

2.3 Nonlinear models |6] 

3 The State Space Algorithms [6] 

3.1 Kalman filter E] 

3.2 State smoother [8] 

3.3 Disturbance smoother [8] 

3.4 Simulation smoother |9] 

4 Getting Started [10] 

4.1 Preliminaries [TO] 

4.2 Model constmction [TO] 

4.3 Model and state estimation [TO] 

5 SSM Classes [U] 

5.1 Class SSMAT [H 

5.2 Class SSDIST [U] 

5.3 Class SSFUNC [H] 

5.4 Class SSPARAM [12] 

5.5 Class SSMODEL [H] 

6 SSM Functions [H] 

7 Data Analysis Examples [15] 

7.1 Structural time series models [15] 

7.1.1 Univariate analysis [TO] 

7.1.2 Bivariate analysis [21] 

7.2 ARMA models [22] 

7.3 Cubic spline smoothing [24] 

7.4 Poisson distribution error model [25] 

7.5 t-distribution models [27] 

7.6 Hillmer-Tiao decomposition [27] 

A Predefined Model Reference ]29] 

B Class Reference [32 

B.l SSMAT [32] 

B.1.1 Subscripted reference and assignment [32] 

B.l. 2 Class methods [32] 

B.2 SSDIST [33] 

B.2.1 Subscripted reference and assignment [34] 



SSM Toolbox for Matlab 3 



B.2.2 Class methods [H 

B.2.3 Predefined non-Gaussian distributions [34l 

B.3 SSFUNC [35] 

B.3.1 Subscripted reference and assignment [35] 

B.3.2 Class methods [35] 

B.4 SSPARAM [36] 

B.4.1 Subscripted reference and assignment [36] 

B.4.2 Class methods [36] 

B. 5 SSMODEL M 

B.5.1 Subscripted reference and assignment [37] 

B.5.2 Class Methods [52] 

C Function Reference [38] 

C. 1 User-defined update functions [38] 

C.2 Data analysis functions [39] 

C.3 Stock element functions [42] 

C.4 Miscellaneous helper functions [45] 



4 



Peng & Aston 



Abstract 

State Space Models (SSM) is a MATLAB 7.0 software toolbox for doing time series analysis by state space meth- 
ods. The software features fully interactive construction and combination of models, with support for univariate and 
multivariate models, complex time-varying (dynamic) models, non-Gaussian models, and various standard models 
such as ARIMA and structural time-series models. The software includes standard functions for Kalman filtering 
and smoothing, simulation smoothing, likelihood evaluation, parameter estimation, signal extraction and forecasting, 
with incorporation of exact initialization for filters and smoothers, and support for missing observations and multiple 
time series input with common analysis structure. The software also includes implementations of TRAMO model 
selection and Hillmer-Tiao decomposition for ARIMA models. The software will provide a general toolbox for doing 
time series analysis on the MATLAB platform, allowing users to take advantage of its readily available graph plotting 
and general matrix computation capabiUties. 

Keywords: Time Series Analysis, State Space Models, Kalman Filter. 
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1 Introduction 

State Space Models (SSM) is a MATLAB toolbox for doing time series analysis using general state space models 
and the Kalman filter 1 1 1 . The goal of this software package is to provide users with an intuitive, convenient and 
efficient way to do general time series modeling within the state space framework. Specifically, it seeks to provide 
users with easy construction and combination of arbitrary models without having to explicitly define every component 
of the model, and to maximize transparency in their data analysis usage so no special consideration is needed for 
any individual model. This is achieved through the unification of all state space models and their extension to non- 
Gaussian and nonlinear special cases. The user creation of custom models is also implemented to be as general, 
flexible and efficient as possible. Thus, there are often multiple ways of defining a single model and choices as 
to the parametrization versus initialization and to how the model update functions are implemented. Stock model 
components are also provided to ease user extension to existing predefined models. Functions that implement standard 
algorithms such as the Kalman filter and state smoother, log likelihood calculation and parameter estimation will work 
seamlessly across all models, including any user defined custom models. 

These features are implemented through object-oriented programming primitives provided by MATLAB and 
classes in the toolbox are defined to conform to MATLAB conventions whenever possible. The result is a highly 
integrated toolbox with support for general state space models and standard state space algorithms, complemented by 
the built-in matrix computation and graphic plotting capabilities of MATLAB. 



2 The State Space Models 

This section presents a summary of the basic definition of models supported by SSM. Currently SSM implements the 
Kalman filter and related algorithms for model and state estimation, hence non-Gaussian or nonlinear models need 
to be approximated by linear Gaussian models prior to or during estimation. However the approximation is done 
automatically and seamlessly by the respective routines, even for user-defined non-Gaussian or nonlinear models. 
The following notation for various sequences will be used throughout the paper: 

Ut p X 1 Observation sequence 

£t p X 1 Observation disturbance (Unobserved) 

at TO X 1 State sequence (Unobserved) 

Tjt r X 1 State disturbance (Unobserved) 



2.1 Linear Gaussian models 

SSM supports linear Gaussian state space model in the form 

yt = Ztat + et, St^Af{0,Ht), 

at+i = ct + Ttat + Rtr]ti Vt ^ ^f{0,Qt), (1) 
ai ~ Af{ai,Pi), t = l,...,n. 

Thus the matrices Zt,Ct,Tt, Rt, Ht,Qt,ai, Pi are required to define a linear Gaussian state space model. The matrix 
Zt is the state to observation linear transformation, for univariate models it is a row vector. The matrix Cf is the same 
size as the state vector, and is the "constant" in the state update equation, although it can be dynamic or dependent 
on model parameters. The square matrix Tt defines the time evolution of states. The matrix Rt transforms general 
disturbance into state space, and exists to allow for more varieties of models. Ht and Qt are Gaussian variance matrices 
governing the disturbances, and ai and Pi are the initial conditions. The matrices and their dimensions are listed: 
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p X m 


State to observation transform matrix 


Ct 


m X 1 


State update constant 


Tt 


m X m 


State update transform matrix 


Rt 


m X r 


State disturbance transform matrix 


Ht 


p X p 


Observation disturbance variance 


Qt 


r X r 


State disturbance variance 


ai 


m X 1 


Initial state mean 


Pi 


m X m 


Initial state variance 



2.2 Non-Gaussian models 

SSM supports non-Gaussian state space models in the form 

yt ^ p{yt\dt), dt^Ztat, 

at+i = ct + Ttat + Rtrjt, Vt ~ Qt ^ pivt), (2) 
ai ~ JV{ai,Pi), t—l,...,n. 

The sequence 9t is the signal and Qt is a non-Gaussian distribution (e.g. heavy-tailed distribution). The non-Gaussian 
observation disturbance can take two forms: an exponential family distribution 

Ht = piytPt) = exp [yj9t - 6* (6*0 + Ct{yt)] , -oo < 0* < oo, (3) 

or a non-Gaussian additive noise 

yt^0t + £t, st^ Ht= p{et). (4) 

With model combination it is also possible for Ht and Qt to be a combination of Gaussian distributions (represented 
by variance matrices) and various non-Gaussian distributions. 



2.3 Nonlinear models 

SSM supports nonlinear state space models in the form 

yt - Zt{at) + et, et^JV{0,Ht), 
at+i = Ct + Tt{at) + RtVt, Vt - ^f{0,Qt), (5) 
ai ^ J\f{ai,Pi), t^l,...,n. 

Zt and Tt are functions that map m x 1 vectors to p x 1 and m x 1 vectors respectively. With model combination it 
is also possible for Zt and Tt to be a combination of linear functions (matrices) and nonlinear functions. 



3 The State Space Algorithms 

This section summarizes the core algorithms implemented in SSM, starting with the Kalman filter Normally the initial 
conditions ai and Pi should be given for these algorithms, but the exact diffuse initialization implemented permits 
elements of either to be unknown, and derived implicitly from the first few observations. The unknown elements in ai 
are set to 0, and the corresponding diagonal entry in Pi is set to oo, this represents an improper prior which reflects the 
lack of knowledge about that particular element a priori. See fT\ for details. To keep variables finite in the algorithms 
we define another initial condition Poo,i with elements 

(P ) I if (Pi),,, < oo, 



SSM Toolbox for Matlab 



7 



and set infinite elements of Pi to 0. AH the state space algorithms supports exact diffuse imtialization. 

For non-Gaussian and nonlinear models hnear Gaussian approximations are performed by linearization of the 
loglikelihood equation and model matrices respectively, and the Kalman filter algorithms can then be used on the 
resulting approximation models. 



3.1 Kalman filter 

Given the observation yt, the model and initial conditions ai,Pi,Poo.i^ Kalman filter outputs at = Fj{at\yi:t-i) 
and Pt — Yai{at\yi.t-i), the one step ahead state prediction and prediction variance. Theoretically the diffuse state 
variance Poo.t will only be non-zero for the first couple of time points (usually the number of diffuse elements, but may 
be longer depending on the structure of Zt), after which exact diffuse initialization is completed and normal Kalman 
filter is appUed. Here n denotes the length of observation data, and d denotes the number of time points where the 
state variance remains diffuse. 

The normal Kalman filter recursion (for t > d) sue as follows: 

Vt = yt-Ztttt, 

Ft = ZtPtZj + Ht, 

Kt = T,PiZjF-\ 

Lt - Tt-K^Z^, 

at+i = Ct + Ttat + Ktvt, 

Pt+i = TtPtLj + RtQtRj . 

The exact diffuse initialized Kalman filter recursion (for t <d)\s divided into two cases, when .Foo.t = ZtPoo,tZt' 
nonsingular: 

Vt = yt- Ztat, 

Ff) = -Fi'\ZtPtZj + Ht)F^'\ 

Kt = TtPoo,tZj Ft^\ 

= Tt{PtZfF^'HP^,tZjFi% 

Lt = Tt-KtZt, 

= -Ki^'Zt, 

flt+i Ci + Tiai + KiVt, 

Pt+i = TtPo.,tLi'^^ + TtPtLj + RtQtRj, 

Poo,t+i = TtP^^tLj , 



and when F^o t = 0: 



Vt = yt-Ztttt, 
Ft = ZtPtZj + Ht, 



Kt = TtP,ZfF^ 



Lt = Tt-KtZt, 
at+i = ct + TtUt + KtVt, 
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Pt+i — TtPfL^ + RtQtRf , 



3.2 State smoother 



Given the Kalman filter outputs, the state smoother outputs ctt = E{at\yi:n) and Vt = Var(Q!t|?/i;n), the smoothed 
state mean and variance given the whole observation data, by backwards recursion. 

To start the backwards recursion, first set r„ = and Nn = 0, where rt and Nt are the same size as at and Pt 
respectively. The normal backwards recursion (for t > d)is then 

n-i = Z^Fi\ + Llrt, 
Nt_, = Z^Ff'Zt + LjNtLt, 
at = at + Ptn-i, 
Vt = Pt-PtNt-iPt. 

For the exact diffuse initiaUzed portion {t < d) set additional diffuse variables r^^ = and N^^ = N^^ = 
corresponding to and respectively, the backwards recursion is 

rt-1 = Lfrt, 

iV« = Z^Fy^Zt + LlN^^'^Lt + Li'^^NtLt, 

Nt_, = LjNtLt, 

at = at + Ptn-i + Poo,trt^2i, 

Vt = Pt-PtNt-iPt-{Poo,tNZ\Ptf 

— Poo,tNt\Pt — -Poo,t-^t_i-foo,t) 



if -Poo,t is nonsingular, and 



_ T^T (1) 

rt-i = ZlFi\ + Llrt, 

Nil\ = T^N^Tt, 

Nt-i = Z^Ff^Zt + LjNtLt, 

at = at + Ptn-i + Poo,trt-i, 

Vt = Pt-PtNt_^Pt-{PooM-iPtf 

— Poo,tNt\Pt — Poo,tNt\Poo,ti 



ifF, 



oo,t 



3.3 Disturbance smoother 

Given the Kalman filter outputs, the disturbance smoother outputs e, = E(ei|yi.„), Var(et|yi.„), fit = ^{VtllJi-.n) 
and Var(ry(|2/i:„). Calculate the sequence rt and Nt by backwards recursion as before, the disturbance can then be 
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estimated by 



fit = QtRfn, 

Var(?7t|i/i:„) = Qt- QtRjNtRtQt, 

it = Ht{Ft-\-Kfrt), 

yai{et\yi:n) = Ht - Ht{Ff^ + NtKt)Ht 



ift>dor Foo,t = Oj and 



m = QtRfrt, 
Var(?7t|2/i:„) = Qt - QtRj NtRtQt, 

it = -HtKjn, 

Vax(£,|t/i.„) = Ht-HtKjNtKtHt, 

at <d and F^^t is nonsingular. 

3.4 Simulation smoother 

The simulation smoother randomly generates an observation sequence yt and the underlying state and disturbance 
sequences at, h and fji conditional on the model and observation data yt. The algorithm is as follows (each step is 
performed for all time points t): 

1. Draw random samples of the state and observation disturbances: 

4 ~ ^(0,Qt). 

2. Draw random samples of the initial state: 

a+~7\A(ai,Pi). 

3. Generate state sequences and observation sequences t/j" from the random samples. 

4. Use disturbance smoothing to obtain 

e* = E(et|yi:„), 
fit = E(77(|?yi:„), 
= E(e+ 1 ?;+„), 
i)t = E(,7+|t/+J. 



5. Use state smoothing to obtain 



6. Calculate 



at = E(at|j/i:„), 
at = E(a+|y+J. 



Oit = at-af + af, 
it = — + 
Vt = fit- fit + vt- 



7. Generate yt from the output. 
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4 Getting Started 

The easiest and most frequent way to start using SSM is by constructing predefined models, as opposed to creating a 
model from scratch. This section presents some examples of simple time series analysis using predefined models, the 
complete list of available predefined models can be found in appendix [A| 

4.1 Preliminaries 

Some parts of SSM is programmed in c for maximum efficiency, before using SSM, go to the subfolder src and 
execute the script mexall to compile and distribute all needed c mex functions. 

4.2 Model construction 

To create an instance of a predefined model, call the SSMODEL (state space model) constructor with a coded string 
representing the model as the first argument, and optional arguments as necessary. For example: 

• model = ssmodel ( ' 11m' ) creates a local level model. 

• model = ssmodel {' arma' , p, q) creates an ARMA(p, g) model. 

The resulting variable model is a SSMODEL object and can be displayed just like any other MATLAB variables. 
To set or change the model parameters, use model .param, which is a row vector that behaves like a MATLAB 
matrix except its size cannot be changed. The initial conditions usually defaults to exact diffuse initialization, where 
model . al is zero, and model .PI is oo on the diagonals, but can likewise be changed. Models can be combined 
by horizontal concatenation, where only the observation disturbance model of the first one will be retained. The class 
SSMODEL will be presented in detail in later sections. 

4.3 Model and state estimation 

With the model created, estimation can be performed. SSM expects the data y to be a matrix of dimensions px n, where 
n is the data size (or time duration). The model parameters are estimated by maximum likelihood, the SSMODEL 
class method estimate performs the estimation. For example: 

• modell = estimate (y, modelO) estimates the model and stores the result in modell, where the pa- 
rameter values of mode 10 is used as initial value. 

• [modell logL] = estimate (y, modelO, psiO, [], optnamel, 

optvaluel, optname2, optvalue2, ...) estimates the model with p s i as the initial parameters 
using option settings specified with option value pairs, and returns the resulting model and loglikelihood. 

After the model is estimated, state estimation can be performed, this is done by the SSMODEL class method 
kalman and statesmo, which is the Kalman filter and state smoother respectively. 

• [a P] = kalman (y, model) applies the Kalman filter on y and returns the one-step-ahead state predic- 
tion and variance. 

• [alphahat V] = statesmo (y, model) applies the state smoother on y and returns the expected state 
mean and variance. 

The filtered and smoothed state sequences a and alphahat are m x n+1 and m x n matrices respectively, and the 
filtered and smoothed state variance sequences P and V are m x m x n+1 and m x m x n matrices respectively, except 
if m = 1, in which case they are squeezed and transposed. 
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5 SSM Classes 

SSM is composed of various MATLAB classes each of which represents specific parts of a general state space model. 
The class SSMODEL represents a state space model, and is the most important class of SSM. It embeds the classes 
SSMAT, SSDIST, SSFUNC and SSPARAM, all designed to facihtate the consti'uction and usage of SSMODEL for 
general data analysis. Consult appendix |B] for complete details about these classes. 

5.1 Class SSMAT 

The class state space matrix (SSMAT) forms the basic components of a state space model, they can represent linear 
transformations that map vectors to vectors (the Zt, Tt and Rt matrices), or variance matrices of Gaussian disturbances 
(the Ht and Qt matrices). 

The first data member of class SSMAT is mat, which is a matrix that represents the stationary part of the state 
space matrix, and is often the only data member that is needed to represent a state space matrix. For dynamic state 
space matrices, the additional data members dmmask and dvec are needed, dmmask is a logical matrix the same 
size as mat, and specifies elements of the state space matrix that varies with time, these elements arranged in a column 
in column-wise order forms the columns of the matrix dvec, thus nnz (dmmask) is equal to the number of rows in 
dvec, and the number of columns of dvec is the number of time points specified for the state space matrix. This is 
designed so that the code 

mat (dmmask) = dvec ( : , t ) ; 

gets the value of the state space matrix at time point t . 

Now these are the constant part of the state space matrix, in that they do not depend on model parameters. The 
logical matrix mmask specifies which elements of the stationary mat is dependent on model parameters (variable), 
while the logical column vector dvmask specifies the variable elements of dvec. Functions that update the state space 
matrix take the model parameters as input argument, and should output results so that the following code correctly 
updates the state space matrix: 

mat (mmask) = funcl(psi); 
dvec (dvmask, :) = func2(psi); 

where fund update the stationary part, and func2 updates the dynamic part. The class SSMAT is designed to 
represent the "structure" and current value of a state space matrix, thus the update functions are not part of the class 
and will be described in detail in later sections. 

Objects of class SSMAT behaves like a matrix to a certain extent, calling size returns the size of the stationary 
part, concatenations horzcat, vertcat and blkdiag can be used to combine SSMAT with each other and ordi- 
nary matrices, and same for plus. Data members mat and dvec can be read and modified provided the structure of 
the state space matrix stays the same, parenthesis indexing can also be used. 

5.2 Class SSDIST 

The class state space distributions (SSDIST) is a child class of SSMAT, and represents non-Gaussian distributions for 
the observation and state disturbances {Ht and Qt). Since disturbances are vectors, it's possible to have both elements 
with Gaussian distributions, and elements with different non-Gaussian distributions. Also in order to use the Kalman 
filter, non-Gaussian distributions are approximated by Gaussian noise with dynamic variance. Thus the class SSDIST 
needs to keep track of multiple non-Gaussian distributions and also their corresponding disturbance elements, so that 
the individual Gaussian approximations can be concatenated in the correct order. 
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As non-Gaussian distributions are approximated by dynamic Gaussian distribution in SSM, its representation need 
only consist of data required in the approximation process, namely the functions that take disturbance samples as input 
and outputs the dynamic Gaussian variance. Also a function that outputs the probability of the disturbance samples 
is needed for loglikelihood calculation. These functions are stored in two data members matf and logpf , both cell 
arrays of function handles. A logical matrix diagmask is needed to keep track of which elements each non-Gaussian 
distribution governs, where each column of diagmask corresponds to each distribution. Elements not specified in 
any columns are therefore Gaussian. Lastiy a logical vector dmask of the same length as matf and logpf specifies 
which distributions are variable, for example, t-distributions can be variable if the variance and degree of freedom are 
model parameters. The update function func for SSDIST objects should be defined to take parameter values as input 
arguments and output a cell mattix of function handles such that 

distf = func(psi); 
[matf {dmask}] = distf{:, l}; 
[logpf {dmask}] = distf{:, 2}; 

correctly updates the non-Gaussian distributions. 

In SSM most non-Gaussian distributions such as exponential family distributions and some heavy-tailed noise 
distributions are predefined and can be constructed similarly to predefined models, these can then be inserted in stock 
components to allow for non-Gaussian disturbance generaUzations. 

5.3 Class SSFUNC 

The class state space functions (SSFUNC) is another child class of SSMAT, and represents nonlinear functions that 
transform vectors to vectors (Zt, T-i and Rt). As with non-Gaussian distribution it is possible to "concatenate" non- 
linear functions corresponding to different elements of the state vector and destination vector. The dimension of input 
and output vector and which elements they corresponds to will have to be kept track of for each nonlinear function, 
and their linear approximations are concatenated using these information. 

Both the function itself and its derivative are needed to calculate the linear approximation, these are stored in 
data members f and df , both cell arrays of function handles, horzmask and vertmask are both logical matrices 
where each column specifies the elements of the input and output vector for each function, respectively. In this 
way the dimensions of the approximating matrix for the ith function can be determined as nnz (vertmask ( : , 
i ) ) X nnz (horzmask ( : , i ) ) . The logical vector fmask specifies which functions are variable with respect to 
model parameters. The update function func for SSFUNC objects should be defined to take parameter values as input 
arguments and output a cell matrix of function handles such that 

funcf = func(psi); 
[f{fmask}] = funcf{:, l}; 
[df{fmask}] = funcf{:, 2}; 

correctly updates the nonlinear functions. 

5.4 Class SSPARAM 

The class state space parameters (SSPARAM) stores and organizes the model parameters, each model parameter has 

a name, value, transform and possible constraints. Transforms are necessary because most model parameters have a 
restricted domain, but most numerical optimization routines generally operate over the whole real line. For example 
a variance parameter cr^ are suppose to be positive, so if ct^ is optimized directly there's the possibility of error, but 
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by defining ip = log(o-^) and optimizing tp instead, this problem is avoided. Model parameter transforms are not 
restricted to single parameters, sometimes a group of parameters need to be transformed together, the same goes for 
constraints. Hence the class SSPARAM also keeps track of parameter grouping. 

Like SSMAT, SSPARAM objects can be treated like a row vector of parameters to a certain extent. Specifically, it is 
possible to horizontally concatenate SSPARAM objects as a means of combining parameter groups. Other operations 
on SSPARAM objects include getting and setting the parameter values and transformed values. The untransformed 
values are the original parameter values meaningful to the model defined, and will be referred to as param in code 
examples; the transformed values are meant only as input to optimization routines and have no direct interpretation in 
terms of the model, they are referred to as psi in code examples, but the user will rarely need to use them directly if 
predefined models are used. 

5.5 Class SSMODEL 

The class state space model (SSMODEL) represents state space models by embedding SSMAT, SSDIST, SSFUNC 
and SSPARAM objects and also keeping track of update functions, parameter masks and general model component 
information. The basic constituent elements of a model is described earlier as Z, T, R, H, Q, c, al and PI. Z, T and 
R are vector transformations, the first two can be SSFUNC or SSMAT objects, but the last one can only be a SSMAT 
object. H and Q govern the disturbances and can be SSDIST or SSMAT objects, c, al and PI can only be SSMAT 
objects. These form the "constant" part of the model specification, and is the only part needed in Kalman filter and 
related state estimation algorithms. 

The second part of SSMODEL concems the update functions and model parameters, general specifications for the 
model estimation process, f unc and grad are cell arrays of update functions and their gradient for the basic model 
elements respectively. It is possible for a single function to update multiple parts of the model and these information 
are stored in the data member A, a length ( f unc ) x 1 8 adjacency matrix. Each row of A represents one update 
function, and each colunm represents one updatable element. The 1 8 updatable model elements in order are 

H Z T R Q c al PI Hd Zd Id Rd Qd cd Hng Qng Znl Tnl 

where the d suffix means dynamic, ng means non-Gaussian and nl means nonlinear. It is therefore possible for a 
function that updates Z to output [vec dsubvec funcf] updating the stationary part of Z, dynamic part of Z and 
the nonlinear functions in Z respectively. The row in A for this function would be 

H Z T R Q c al PI Hd Zd Td Rd Qd cd Hng Qng Znl Tnl 
01000000010000 1 

Any function can update any combination of these 18 elements, the only restriction is that the output should follow 
the order set in A with possible omissions. 

On the other hand A also allows many functions to update a single model element, this is the main mechanism that 
enables smooth model combination. Suppose Tl of model 1 is updated by fund, and T2 of model 2 by func2, 
combination of models 1 and 2 requires block diagonal concatenation of Tl and T2: T = blkdiag ( Tl , T2 ) . It 
is easy to see that the vertical concatenation of the output of fund and f unc2 correctly updates the new matrix T 
as follows 



vecl = fund (psi) 
vec2 = func2(psi) 
T .mat (T .mmask) = [vecl; 



vec2 ] ; 
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This also holds for horizontal concatenation, but for vertical concatenation of more than one column this would fail, 
luckily this case never occurs when combining state space models, so can be safely ignored. The updates of dynamic, 
non-Gaussian and nonlinear elements can also be combined in this way, in fact, the whole scheme of the update 
functions are designed with combinable output in mind. For an example adjacency matrix A 

H Z T R Q c al PI Hd Zd Td Rd Qd cd Hng Qng Znl Tnl 
fund 10101000000000 1 
func2 00100000010000 
func3 01101000010000 1 

model update would then proceed by first invoking the three update functions 

[Hvecl Tvecl Qvecl Zfuncfl] = funcl(psi); 
[Tvec2 Zdsubvec2] = func2 (psi) ; 
[Zvec3 Tvec3 QvecS Zdsubvec3 ZfuncfS] = func3(psi); 

and then update the various elements 

H . mat (H . mmask) = Hvecl; 
Z . mat ( Z . mmask) = Zvec3; 
T .mat (T .mmask) = [Tvecl; Tvec2; Tvec3]; 
Q .mat (Q .mmask) = [Qvecl; QvecS] ; 
Z . dvec (Z . dvmask, :) = [Zdsubvec2; Zdsubvec3]; 
Z.f(Z.fmask) = [Zfuncfl{:, l}; Zfuncf3{:, l}] ; 
Z.df (Z.fmask) = [Zfuncfl{:, 2}; Zfuncf3{:, 2}]; 

Now to faciUtate model construction and alteration, functions that need adjacency matrix A as input actually 
expects another form based on strings. Each of the 18 elements shown earlier is represented by the corresponding 
strings, ' H' , ' Qng' , ' Zd' , . . . etc., and each row of A is a single string in a cell array of strings, where each string is 
a concatenation of various elements in any order. This form (a cell array of strings) is called "adjacency string" and is 
used in all functions that require an adjacency matrix as input. 

The remaining data members of SSMODEL are psi and pmask. psi is a SSPARAM object that stores the model 
parameters, and pmask is a cell array of logical row vectors that specifies which parameters are required by each up- 
date function. So each update function are in fact provided a subset of model parameters funci (psi .value (pmask{ 
It is trivial to combine these two data members when combining models. 

In summary the class SSMODEL can be divided into two parts, a "constant" part that corresponds to an instance 
of a theoretical state space model given all model parameters, which is the only part required to run Kalman filter 
and related state estimation routines, and a "variable" part that keeps track of the model parameters and how each 
parameter effects the values of various model elements, which is used (in addition to the "constant" part) in model 
estimation routines that produce parameter estimates. This division is the careful separation of the "structure" of a 
model and its "instantiation" given model parameters, and allows logical separation of various structural elements 
of the model (with the classes SSMAT, SSDIST and SSFUNC) to facilitate model combination and usage, without 
compromising the abiUty to efficiently define and combine complex model update mechanisms. 



6 SSM Functions 



There are several types of functions in SSM besides class methods, these are data analysis functions, stock element 
functions and various helper functions. Most data analysis functions are Kalman filter related and implemented as 
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SSMODEL methods, but for the user this is transparent since class methods and functions are called in the same way 
in MATLAB. Examples for data analysis functions include kalman, statesmo and loglik, all of which uses the 
kalman filter and outputs the filtered states, smoothed states and loglikelihood respectively. These and similar functions 
take three arguments, the data y, which must be p x n, where p is the number of variables in the data and n the data 
size, the model model, a SSMODEL object, and optional analysis option name value pairs (see function setopt in 



Appendix C.4l, where settings such as tolerance can be specified. The functions linear and gauss calculates the 
linear Gaussian approximation models, and requires y, model and optional initial state estimate alphaO as input 
arguments. The function estimate runs numerical optimization routines for model estimation; model updates, 
necessary approximations and loglikelihood calculations are done automatically in the procedure, input arguments 
are y, model and initial parameter estimate psiO, the function outputs the maximum hkelihood model. There are 
also some functions specific to ARlMA-type models, such as Hillmer-Tiao decomposition IJl and TRAMO model 
selection [3|. 

Stock element functions create individual parts of predefined models, they range from the general to the specific. 
These functions have names of the form (element) _ (description) , for example, the function mat_arma 
creates matrices for the ARMA model. Currently there are five types of stock element functions: fun_* creates 
stationary or dynamic matrix update functions, mat_* creates matrix structures, ngdist_* creates non-Gaussian 
distributions (or the SSM representation of which), ngf un_* creates non-Gaussian distribution update functions, and 
x_* creates stock regression variables such as trading day variables. Note that these functions are created to be class 
independent whenever possible, in that they do not directly construct or output SSM class objects, although the form 
of the output may imply the SSM class structures (an exception is that fun_* functions return a SSPARAM object). 
This is primarily due to modular considerations and also that these "utility" functions can then be useful to users who 
(for whatever reason) do not wish to use the SSM classes. 

Miscellaneous helper functions provide additional functionality unsuitable to be implemented in the previous two 
categories. The most important of which is the function setopt, which specifies the global options or settings 
structure that is used by almost every data analysis function. One time overrides of individual options are possible 
for each function call, and each option can be individually specified, examples of possible options include tolerance 
and verbosity. Other useful functions include ymarray which generates a year, month array from starting year, 
starting month and number of months and is useful for functions like x_td and x_ee which take year, month arrays as 
input, randarma which generates a random AR or MA polynomial, and meancov which calculates the mean and 
covariance of a vector sequence. 



7 Data Analysis Examples 

Many of the data analysis examples are based on the data in the book by Durbin and Koopman [[T], to facilitate easy 
comparison and understanding of the method used here. 



7.1 Structural time series models 

In this example data on road accidents in Great Britain ID is analyzed using structural time series models following 
Q. The purpose the the analysis is to assess the effect of seat belt laws on road accident casualties, with individual 
monthly figures for drivers, front seat passengers and rear seat passengers. The monthly price of petrol and average 
number of kilometers traveled will be used as regression variables. The data is from January 1969 to December 1984. 
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7.1.1 Univariate analysis 

The drivers series will be analyzed with univariate structural time series model, which consists of local level compo- 
nent /Ut, trigonometric seasonal component 74, regression component (on price of petrol) and intervention component 
(introduction of seat belt law) (3xt. The model equation is 

yt = Ht + lt + Pxt + st, (7) 
where is the observation noise. The following code example constructs this model: 

Ivl = ssmodel ( ' 11m' ) ; 

seas = ssmodel (' seasonal' , 'trigl', 12); 

intv = ssmodel (' intv' , n, 'step', 170); 

reg = ssmodel (' reg' , petrol, 'price of petrol'); 

bstsm = [Ivl seas intv reg] ; 

bstsm.name = 'Basic structural time series model'; 
bstsm. param = [10 0.1 0.001]; 

The MATLAB display for object bst sm is 

bstsm = 

Basic structural time series model 



P = 1 
m = 14 
r = 12 
n = 192 



Components (5) : 



[Gaussian noise] 

P = 1 

[local polynomial trend] 

d =0 
[ seasonal ] 

subtype = trigl 

s =12 
[ intervention] 

subtype = step 

tau = 170 
[regression] 

variable = price of petrol 



H matrix (1, 1) 
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10.0000 



Z matrix (1, 14, 192) 



Stationary part; 



110101010101 DYN DYN 



Dynamic part : 

Columns 1 through 9 

000000000 
-2.2733 -2.2792 -2.2822 -2.2939 -2.2924 -2.2968 -2.2655 -2.2626 -2.2655 

Columns 10 through 18 

000000000 
-2.2728 -2.2757 -2.2828 -2.2899 -2.2956 -2.3012 -2.3165 -2.3192 -2.3220 



Columns 181 through 189 

111111111 
-2.1390 -2.1646 -2.1565 -2.1597 -2.1644 -2.1648 -2.1634 -2.1646 -2.1707 

Columns 190 through 192 

111 
-2.1502 -2.1539 -2.1536 



T matrix (14, 14) 



Columns 1 through 9 



1 

0.8660 

-0.5000 









0.5000 
0.8660 









.5000 



-0.8660 








.8660 
0.5000 
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R matrix (14, 12) 
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Q matrix (12, 



12) 
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Columns 1 through 9 
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Columns 10 through 12 
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PI matrix (14, 14) 
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000000000000 InfO 
0000000000000 Inf 



Adjacency matrix (3 functions) : 



Function H Z T R Q c al P 1 Hd Zd Td Rd Qd cd Hng Qng Znl Tnl 

fun_var/psi2var 100000000000000 

fun_var/psi2var 000010000000000 

fun_dupvar/psi2dupvarl 000010000000000 

Parameters (3) 



Name Value 

epsilon var 10 

zeta var . 1 

omega var 0.001 



The constituting components of the model and the model matrices are all displayed, note that Z is dynamic with the 
intervention and regression variables, and matrices c and a 1 are omitted if they are zero, also H and Q take on values 
determined by the parameters set. 

With the model constructed, estimation can proceed with the code: 

[bstsm logL] = estimate (y, bstsm) ; 

[alphahat V] = statesmo (y, bstsm); 
irr = disturbsmo (y, bstsm); 
ycom = signal (alphahat, bstsm); 
ylvl = sum (ycom ([1 3 4], :), 1); 
yseas = ycom(2, :); 

The estimated model parameters are [0.0037862 0.00026768 1.162e-006], which can be obtained by dis- 
playing bstsm. param, the loglikelihood is 175 . 7790. State and disturbance smoothing is performed with the 
estimated model, and the smoothed state is transformed into signal components by signal. Because p = 1, the out- 
put ycom is M X n, where M is the number of signal components. The level, intervention and regression are summed as 
the total data level, separating seasonal influence. Using MATLAB graphic functions the individual signal components 
and data are visualized: 

subplot (3, 1, 1), plot (time, y, 'r:', ' DisplayName' , 'drivers'), hold all, 
plot (time, ylvl, 'DisplayName', 'est. level'), hold off, 
title (' Level' ) , ylim([6.875 8]), legend (' show' ) ; 

subplot (3, 1, 2), plot (time, yseas), title (' Seasonal' ) , ylim([-0.16 0.28]); 

subplot (3, 1, 3), plot (time, irr), title (' Irregular' ) , ylim([-0.15 0.15]); 

The graphical output is shown in figure [T] The coefficient for the intervention and regression is defined as part of the 
state vector in this model, so they can be obtained from the last two elements of the smoothed state vector (due to the 
order of component concatenation) at the last time point. Coefficient for the intervention is alpliahat (13, end) 
= -0 . 23773, and coefficient for the regression (price of petrol) is alpliahat ( 14 , end) = -0 . 2 914. In this 
way diagnostics of these coefficient estimates can also be obtained by the smoothed state variance V. 
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Level 




68 70 72 74 76 78 80 82 84 

Seasonal 




68 70 72 74 76 78 80 82 84 



Irregular 




Figure 1 : Driver casualties estimated by basic structural time series model 
7.1.2 Bivariate analysis 

The front seat passenger and rear seat passenger series will be analyzed together using bivariate structural time series 
model, with components as before. Specifically, separate level and seasonal components are defined for both series, but 
the disturbances are assumed to be correlated. To reduce the number of parameters estimated the seasonal component 
is assumed to be fixed, so that the total number of parameters is six. We also include regression on the price of petrol 
and kilometers traveled, and intervention for only the first series, since the seat belt law only effects the front seat 
passengers. The following is the model construction code: 

bilvl = ssmodel ( ' mvllm' , 2); 

biseas = ssmodel (' mvseasonal' , 2, [], 'trig fixed', 12); 
biintv = ssmodel (' mvintv' , 2, n, {'step' 'null'}, 170); 
bireg = ssmodel (' mvreg' , 2, [petrol; km]); 
bistsm = [bilvl biseas biintv bireg] ; 

bistsm.name = 'Bivariate structural time series model'; 

The model is then estimated with carefully chosen initial values, and state smoothing and signal extraction proceeds 
as before: 

[bistsm logL] = estimate(Y2, bistsm, [0.0054 0.0086 0.0045 0.00027 0.00024 0.00023]); 

[alphahat V] = statesmo(y2, bistsm); 
Y2com = signal (alphahat, bistsm); 
y21v1 = sum(y2com( : , : , [1 3 4]), 3); 
y2seas = y2com(:,:, 2); 

When p > 1 the output from signal is of dimension p x n x M, where M is the number of components. The 
level, regression and intervention are treated as one component of data level as before, separated from the seasonal 
component. The components estimated for the two series is plotted: 

subplot (2, If 1), plot (time, y21vl(l, :), ' DisplayName' , 'est. level'), hold all, 
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Front seat passenger level 



est. level 




68 70 72 74 76 78 80 82 84 



Rear seat passenger level 




68 70 72 74 76 78 80 82 84 



Figure 2: Passenger casualties estimated by bivariate structural time series model 

scatter (time, y2(l, 8, 'r', 's', 'filled', ' DisplayName' , 'front seat'), hold off, 

title('Front seat passenger level'), xlim([68 85]), ylim([5 7.25]), legend (' sliow' ) ; 
subplot (2, 1, 2), plot (time, y21vl(2, :), ' DisplayName' , 'est. level'), l^old all, 

scatter (time, y2(2, :), 8, 'r', 's', 'filled', 'DisplayName', 'rear seat'), liold off, 
title('Rear seat passenger level'), xlim([68 85]), ylim([5.375 6.5]), legend (' sl^ow' ) ; 

The graphical output is shown in figure [2] The intervention coefficient for the front passenger series is obtained 
by alphahat (25, end) = -0 . 30 025, the next four elements are the coefficients of the regression of the two 
series on the price of petrol and kilometers traveled. 

7.2 ARMA models 

In this example the difference of the number of users logged on an internet server |5 1 is analyzed by ARMA models, 
and model selection via BIC and missing data analysis are demonstrated. To select an appropriate ARMA(p, q) model 
for the data various values for p and q are tried, and the BIC of the estimated model for each is recorded, the model 
with the lowest BIC value is chosen. 

for p = : 5 

for q = : 5 

arma = ssmodel ( ' arma' , p, q) ; 
[arma logL output] = estimate (y, arma, 0.1); 
BIC(p+l, q+1) = output. BIC; 

end 

end 

[m i] = min (BIC ( : ) ) ; 

arma = estimate (y, ssmodel (' arma' , mod(i-l, 6), f loor ( ( i-1 ) / 5 ) ) , 0.1); 
The BIC values obtained for each model is as follows: 
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Figure 3: Forecast using ARMA(1, 1) model with 50% confidence interval 
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The model with the lowest BIC is ARMA(1, 1), second lowest is ARMA(3, 0), the former model is chosen for subse- 
quent analysis. 

Next missing data is simulated by setting some time points to NaN, model and state estimation can still proceed 
normally with missing data present. 

Y([6 16 26 36 46 56 66 72 73 74 75 76 86 96]) = NaN; 

arma = estimate (y, arma, 0.1); 

yf = signal (kalman (y, arma), arma); 

Forecasting is equivalent to treating future data as missing, thus the data set y is appended with as many NaN 
values as the steps ahead to forecast. Using the previous estimated ARMA(1, 1) model the Kalman filter will then 
effectively predict future data points. 

[a P V F] = kalman ([y repmat (NaN, 1, 20)], arma); 

yf = signal (a(:, l:end-l), arma); 

conf50 = . 675*realsqrt (F) ; oonf50(l:n) = NaN; 

plot(yf), holdall, plot ( [yf +conf 50 ; yf-conf50]', 'g:'), 

scatter (1 : length (y) , y, 10, ' r' , 's', 'filled'), hold off, ylim([-15 15]); 

The plot of the forecast and its confidence interval is shown in figure |3] Note that if the Kalman filter is replaced with 
the state smoother, the forecasted values will still be the same. 
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Spline and 95% conf dence interaals 
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Figure 4: Cubic spline smoothing of motorcycle acceleration data 



7.3 Cubic spline smoothing 

The general state space formulation can also be used to do cubic spline smoothing, by putting the cubic spline into 
an equivalent state space form, and accounting for the continuous nature of such smoothing procedures. Here the 
continuous acceleration data of a simulated motorcycle accident fE\ is smoothed by the cubic spline model, which is 
predefined. 

spline = estimate (y, ssmodel (' spline' , delta), [1 0.1]); 

[alphahat V] = statesmo (y, spline); 

conf95 = squeeze (1 . 96*realsqrt (V(l, 1, :)))'; 

[eps eta epsvar] = disturbsmo (y, spline); 

The smoothed data and standardized irregular is plotted and shown in figure]?] 

subplot (2, 1, 1), plot (time, alphahat (1, :), 'b'), hold all, 

plot (time, [alphahat (1, :) + conf 95; alphahat (1, :) - conf 95] , 'b:'), 

scatter (time, y, 10, ' r' , 's', 'filled'), hold off, 

title (' Spline and 95% confidence intervals'), ylim([-140 80]), 

set (gca, ' YGrid' , ' on' ) ; 

subplot (2, 1, 2), scatter (time, eps . /realsqrt (epsvar ) , 10, ' r' , 's', 'filled'), 
title (' Standardized irregular'), set (gca, ' YGrid' ,' on' ) ; 

It is seen from figure]4]that the irregular may be heteroscedastic, an easy ad hoc solution is to model the changing 
variance of the irregular by a continuous version of the local level model. Assume the irregular variance is a^h^ at 
time point t and hi = 1, then we model the absolute value of the smoothed irregular abs (eps ) with ht as its level. 
The continuous local level model with level ht needs to be constructed from scratch. 

contllm = ssmodel ('', 'continuous local level', ... 

0, 1, 1, 1, ssmat(0, [], true, zeros(l, n) , true), ... 

'Qd', {@(X) exp (2*X) *delta} , {[]}, ssparam ( { ' zeta var'}, '1/2 log')); 
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Figure 5: Cubic spline smoothing of motorcycle acceleration data with heteroscedastic noise 

contllm = [ ssmodel (' Gaussian' ) contllm] ; 

alphahat = statesmo (abs (eps ) , estimate (abs (eps) , contllm, [1 0.1])); 
h2 = (alphahat /alphahat ( 1 ))." 2 ; 

h2 is then the relative magnitude of the noise variance hf at each time point, which can be used to construct a custom 
dynamic observation noise model as follows. 

hetnoise = ssmodel ('', 'Heteroscedastic noise', ... 

ssmat(0, [], true, zeros (1, n) , true), zeros (1, 0), [], [], [], ... 

'Hd', {@(X) exp (2*X) *h2 } , {[]}, ssparam ( { ' epsilon var ' } , '1/2 log')); 
hetspline = estimate (y, [hetnoise spline], [1 0.1]); 
[alphahat V] = statesmo (y, hetspline); 
conf95 = squeeze ( 1 . 96*realsqrt (V ( 1 , 1, :)))'; 
[eps eta epsvar] = disturbsmo (y, hetspline); 

The smoothed data and standardized irregular with heteroscedastic assumption is shown in figure |5] it is seen that the 
confidence interval shrank, especially at the start and end of the series, and the irregular is slightly more uniform. 

In summary the motorcycle series is first modeled by a cubic spline model with Gaussian irregular assumption, 
then the smoothed irregular magnitude itself is modeled with a local level model. Using the irregular level estimated 
at each time point as the relative scale of irregular variance, a new heteroscedastic irregular continuous model is 
constructed with the estimated level built-in, and plugged into the cubic spline model to obtain new estimates for the 
motorcycle series. 



7.4 Poisson distribution error model 



The road accident casualties and seat belt law data analyzed in section 7. 1 also contains a monthly van driver casualties 
series. Due to the smaller numbers of van driver casualties the Gaussian assumption is not justified in this case. 
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data 
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Figure 6; Van driver casualties and estimated level 

previous methods cannot be applied. Here a poisson distribution is assumed for the data, the mean is exp{9t) and the 
observation distribution is 

p{yt\Ot) = exp (efyt - expiOt) - logyt!) 

The signal dt in turn is modeled with the structural time series model. The total model is then constructed by con- 
catenating a poisson distribution model with a standard structural time series model, the former model will replace the 
default Gaussian noise model. 

pbstsm = [ ssmodel (' poisson' ) ssmodel ( ' 11m' ) ... 

ssmodel (' seasonal' , 'dummy fixed', 12) ssmodel (' intv' , n, 'step', 170)]; 
pbstsm. name = 'Poisson basic STSM' ; 

Model estimation will automatically calculate the Gaussian approximation to the poisson model. Since this is an 
exponential family distribution the data yt also need to be transformed to yt, which is stored in the output argument 
output, and used in place of yt for all functions implementing linear Gaussian (Kalman filter related) algorithms. 
The following is the code for model and state estimation 

[pbstsm logL output] = estimate (y, pbstsm, 0.0006, [], ' fmin' , 'bfgs', 'disp', ' iter' ) ; 
[alphahat V] = statesmo (output . ytilde, pbstsm); 
thetacom = signal (alphahat, pbstsm); 

Note that the original data y is input to the model estimation routine, which also calculates the transform ytilde. 
The model estimated then has its Gaussian approximation built-in, and will be treated by the state smoother as a linear 
Gaussian model, hence the transformed data ytilde needs to be used as input. The signal components tlietacom 
obtained from the smoothed state alpliahat is the components of 9t, the mean of y can then be estimated by 
exp ( sum (thetacom, 1)). 

The exponentiated level exp{dt) with the seasonal component eliminated is compared to the original data in figure 
|6] The effect of the seat belt law can be clearly seen. 

thetalvl = thetacom (1, :) + thetacom (3, :); 

plot (time, exp (thetalvl ) , ' DisplayName' , 'est. level'), hold all, 

plot(time, y, 'r:', 'DisplayName', 'data'), hold off, ylim([l 18]), legend (' show' ) ; 
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Figure 7: t-distribution irregular 

7.5 t-distribution models 

In this example another kind of non-Gaussian models, t-distribution models is used to analyze quarterly demand for 
gas in UK Q. A structural time series model with a t-distribution heavy-tailed observation noise is constructed similar 
to the last section, and model estimation and state smoothing performed. 

tstsm = [ssmodel ( ' t' ) ssmodel ( ' lit ' ) ssmodel (' seasonal' , 'dummy', 4)]; 

tstsm = estimate(y, tstsm, [0.0018 4 7.7e-10 7 . 9e-6 0.0033], [], ' fmin' , 'bfgs', 'disp', ' iter' ) ; 
[alpha irr] = fastsmo(y, tstsm); 

plot (time, irr), title (' Irregular component'), xlim([1959 1988]), ylim([-0.4 0.4]); 

Since t-distribution is not an exponential family distribution, data transformation is not necessary, and y is used 
throughout. A plot of the irregular in figure |7] shows that potential outliers with respect to the Gaussian assumption 
has been detected by the use of heavy -tailed distribution. 



7.6 Hillmer-Tiao decomposition 

In this example seasonal adjustment is performed by the Hillmer-Tiao decomposition ||2l of airline models. The data 
(Manufacturing and reproducing magnetic and optical media, U.S. Census Bureau) is fitted with the airline model, then 
the estimated model is Hillmer-Tiao decomposed into an ARIMA components model with trend and seasonal compo- 
nents. The same is done for the generalized airline modeQ|8|, and the seasonal adjustment results are compared. The 
following are the code to perform seasonal adjustment with the airline model: 

air = estimate (y, ssmodel (' airline' ) , 0.1); 
aircom = ssmhtd(air); 

ycom = signal ( statesmo (y, aircom), aircom); 
airseas = yoom(2, :); 

aircom is the decomposed ARIMA components model corresponding to the estimated airline model, and airseas 
is the seasonal component, which will be subtracted out of the data y to obtain the seasonal adjusted series, ssmlitd 
automatically decompose ARIMA type models into trend, seasonal and irregular components, plus any extra MA 
components as permissible. 

The same seasonal adjustment procedure is then done with the generalized airline model, using parameter estimates 
from the airline model as initial parameters: 

'Currently renamed Frequency Specific ARIMA model. 
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Figure 8: Seasonal adjustment with airline and generalized airline models 



paramO = air . par am ( [ 1 2 2 3 ] ) ; paramO (1:3) = -paramO (1:3) ; paramO (2:3) = paramO (2:3) (1/12) ; 
ga = estimate (y, ssmodel ( ' genair' , 3, 5, 3), paramO); 
gacom = ssmhtd(ga); 

ycom = signal ( statesmo (y, gacom), gacom); 
gaseas = ycom (2, :); 

The code creates a generalized airline 3-5-1 model, Hillmer-Tiao decomposition produces the same components as for 
the airline model since the two models have the same order. From the code it can be seen that the various functions 
work transparently across different ARIMA type models. Figure [8] shows the comparison between the two seasonal 
adjustment results. 
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A Predefined Model Reference 

The predefined models can be organized into two categories, observation disturbance models and normal models. The 
former contains only specification of the observation disturbance, and is used primarily for model combination; the 
latter contains all other models, and can in turn be partitioned into structural time series models, ARIMA type models, 
and other models. 

The following is a list of each model, their string code and the arguments required to construct them via the 
constructor ssmodel. 

• Observation disturbance models 

- Gaussian noise: ^( ' Gaussian' | 'normal' [, p, cov] ) 




p is the number of variables, default 1. 

cov is true if they are correlated, default true. 

- Null noise: --(' null' [ , p] ) 

p is the number of variables, default 1. 

- Poisson error: ^ ( ' poisson' ) 

- Binary error: ^ ( ' binary' ) 

- Binomial error: ^ ('binomial' , k) 

k is the number of trials, can be a scalar or row vector 

- Negative binomial error: ^ ( ' negbinomial' , k) 
k is the number of trials, can be a scalar or row vector. 

- Exponential error: ^ ( ' exp' ) 

- Multinomial error: ~ (' multinomial' , h, k) 
h is the number of cells. 

k is the number of trials, can be a scalar or row vector 

- Exponential family error: ~ (' exp family ' , b, d2b, id2bdb, c) 



2007. 




p{y\9) = exp {y^e - b{9) + c{y)) 



b is the function b{6). 

d2b is b{9), the second derivative of b{9). 



^Substitute with name of relevant the class constructor. 
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id2bdbis 6(6l)-i6((9). 
c is the function c{y). 

- Zero-mean stochastic volatility error: ~ ( ' zmsv' ) 

- t-distribution noise: ( ' t ' [ , nu ] ) 

nu is the degree of freedom, will be estimated as model parameter if not specified. 

- Gaussian mixture noise: (' mix' ) 

- General error noise: ~ (' error' ) 

• Structural time series models 

- Integrated random walk: ^ ( ' irw' , d) 
d is the order of integration. 

- Local polynomial trend: ( ' Ipt ' , d) 

d is the order of the polynomial. 

- Local level model: ~ ( ' 11m' ) 

- Local level trend: ( ' lit ' ) 

- Seasonal components: (' seasonal' , type, s) 

type can be ' dummy' , ' dummy fixed' , ' h&s' , ' trigl' , ' trig2 ' or ' trig fixed' . 
s is the seasonal period. 

- Cycle component: ( ' cycle' ) 

- Regression components: ( ' reg' , x [ , varname] ) 

- Dynamic regression components: ( ' dynreg' , x [ , varname ] ) 
X is a m X n matrix, m is the number of regression variables, 
varname is the name of the variables. 

- Intervention components: (' intv' , n, type, tau) 
n is the total time duration. 

type can be ' step' , ' pulse' , ' slope' or ' null' . 
tau is the onset time. 

- Constant components: (' constant' ) 

- leading day variables: ^ (' td6' I 'tdl', y, m, N) 
' td6' creates a six-variable trading day model. 

' tdl ' creates a one-variable trading day model. 

- Length-of-month variables: ~ (' lom' , y, m, N) 

- Leap-year variables: (' ly' , y, m, N) 

- Easter effect variables: ~ (' ee' , y, m, N, d) 

y is the starting year. 

m is the starting month. 

N is the total number of months. 

d is the number of days before Easter. 
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- Structural time series models: (' St sm' , Ivl, seas, s[, cycle, x] ) 
Ivl is ' level' or ' trend' . 

seas is the seasonal type (see seasonal components), 
s is the seasonal period. 

cycle is true if there's a cycle component in the model, default false. 
X is explanatory (regression) variables (see regression components). 

- Common levels models: ~ (' commonlvls' , p, A_ast, a_ast) 












Vt = 




+ 




a* 




A* 



p is the number of variables (length of yt). 
A_ast is A*, & {p — r) x r matrix. 
a_ast is a*, a (p — r) x 1 vector. 

- Multivariate local level models: ( ' mvllm' , p [ , cov] ) 

- Multivariate local level trend: ( ' mvllt ' , p [ , cov] ) 

- Multivariate seasonal components: ~ (' mvseasonal' , p, cov, type, s) 

- Multivariate cycle component: ( ' mvcycle' , p [ , cov] ) 
p is the number of variables. 

cov is a logical vector that is true for each correlated disturbances, default all true. 
Other arguments are the same as the univariate versions. 

- Multivariate regression components: ^ ( ' mvreg' , p, x [ , dep] ) 

dep is a p X m logical matrix that specifies dependence of data variables on regression variables. 

- Multivariate intervention components: ^ (' mvintv' , p, n, type, tau) 

- Multivariate structuraltime series models: ~ ('mvstsm' , p, cov, Ivl, seas, s[, cycle 
X, dep] ) 

cov is a logical vector that specifies the covariance structure of each component in turn. 
• ARIMA type models 

- ARMA models: ~ (' arma' , p, q, mean) 

- ARIMA models: ~ (' arima' , p, d, q, mean) 

- Multiplicative seasonal ARIMA models: ~ (' sarima' , p, d, q, P, D, Q, s, mean) 

- Seasonal sum ARMA models: ~ (' sumarma' , p, q, D, s, mean) 

- SARIMA with Hillmer-Tiao decomposition: ~ (' sarimahtd' , p, d, q, P, D, Q, s, gau 

p is the order of AR. 

d is the order of I. 

q is the order of MA. 

P is the order of seasonal AR. 

D is the order of seasonal I. 

Q is the order of seasonal MA. 

s is the seasonal period. 

mean is true if the model has mean. 

gauss is true if the irregular component is Gaussian. 
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- Airline models: ^('airline' [, s] ) 
s is the period, default 12. 

- Generalized airline models: ^ (' genair' , nparam, nfreq, freq) 

nparam is the number of parameters, 3 or 4. 

nfreq is the size of the largest subset of frequencies sharing the same parameter, 
freq is an array containing the members of the smallest subset. 

- ARIMA component models: ^ (' arimacom' , d, D, s, phi, theta, ksivar) 
The arguments match those of the function htd, see its description for details. 

• Other models 

- Cubic spline smoothing (continuous time): ^ ( ' spline' , delta) 
delta is the time duration of each data point. 

- 1/f noise models (approximated by AR) :~('l/f noise', m) 
m is the order of the approximating AR process. 

B Class Reference 

B.l SSMAT 

The class SSMAT represents state space matrices, can be stationary or dynamic. SSMAT can be horizontally, vertically, 
and block diagonally concatenated with each other, as well as with ordinary matrices. Parenthesis reference can also 
be used, where two or less indices indicate the stationary part, and three indices indicates a 3-d matrix with time as the 
third dimension. 

B.1.1 Subscripted reference and assignment 



. mat 


R^ 


Stationary part of SSMAT 


Assignment cannot change size. 


. mma s k 


R 


Variable mask 


A logical matrix the same size as mat. 


. n 


R 


Time duration 


Is equal to size (dvec, 2). 


. dmmask 


R 


Dynamic mask 


A logical matrix the same size as mat. 


.d 


R 


Number of dynamic elements 


Is equal to size (dvec, 1). 


. dvec 


RA 


Dynamic vector sequence 


A d X n matrix. Assignment cannot change d. 


. dvmask 


R 


Dynamic vector variable mask 


A d X 1 logical vector 


. const 


R 


True if SSMAT is constant 


A logical scalar. 


. sta 


R 


True if SSMAT is stationary 


A logical scalar. 


(i, j) 


RA 


Stationary part of SSMAT 


Assignment cannot change size. 


(i, j, t) 


RA 


SSMAT as a 3-d matrix 


Assignment cannot change size. 



B.1.2 Class methods 

• ssmat 

SSMAT constructor. 

SSMAT (m [ , mmask] ) creates a SSMAT. m can be a 2-d or 3-d matrix, where the third dimension is time, 
mmask is a 2-d logical matrix masking the variable part of matrix, which can be omitted or set to [ ] for 

- reference, A - assignment. 
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constant SSMAT. 

SSMAT (m, mmask, dminask[, dvec, dvmask] ) creates a dynamic SSMAT. m is a 2-d matrix forming 
the stationary part of the SSMAT. mmask is a logical matrix masking the variable part of matrix, which can 
be set to [ ] for constant SSMAT. dmmask is a logical matrix masking the dynamic part of SSMAT. dvec is 
a nnz (dmmask) x n matrix, dvec ( : , t) is the values for the dynamic part at time t (m (dmmask) = 
dvec ( : , t) for time t), default is a nnz (dmmask) x 1 zero matrix, dvmask is a nnz (dmmask) x 1 
logical vector masking the variable part of dvec. 

• isconst 

I SCON ST ( 1^ returns true if the stationary part of SSMAT is constant. 

• isdconst 

ISDCONST (•) returns true if the dynamic pait of SSMAT is constant. 

• issta 

I S S T A ( • ) returns true if S SMAT is stationary. 

• size 

[ ... ] =SIZE(-, ...) is equivalent to [ ... ] = SIZE (-.mat, ...). 

• getmat 

GETMAT ( • ) returns SSMAT as a matrix if stationary, or as a cell array of matrices if dynamic. 

• getdvec 

GETDVEC ( • , mmask ) returns the dynamic elements specified in matrix mask mmask as a vector sequence. 

• getn 

GETN ( • ) returns the time duration of SSMAT. 

• setmat 

SETMAT(-, vec) updates the variable Stationary matrix, vec must have size nnz (mmask) x 1. 

• setdvec 

SETDVEC(-, dsubvec) updates the variable dynamic vector sequence, dsubvec musthave nnz (dvmask) 
rows. 

• plus 

SSMAT addition. 



B.2 SSDIST 

The class SSDIST represents state space distributions, a set of non-Gaussian distributions that governs a disturbance 
vector (which can also have Gaussian elements), and is a child class of SSMAT. SSDIST are constrained to be square 
matrices, hence can only be block diagonally concatenated, any other types of concatenation will ignore the non- 
Gaussian distribution part. Predefined non-Gaussian distributions can be constructed via the SSDIST constructor. 

"^An instance of the class in question. 
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B.2.1 Subscripted reference and assignment 



nd 


R 


Number of distributions 


A scalar. 


type 


R 


Type of each distribution 


A 0-1 row vector. 


matf 


R 


Approximation functions 


A cell array of functions. 


logpf 


R 


Log probability functions 


A cell array of functions. 


diagmask 


R 


Non-Gaussian masks 


A size ( mat , 1 ) x nd logical matrix. 


dmask 


R 


Variable mask for the distributions 


A 1 X nd logical vector. 


const 


R 


True if SSDIST is constant 


A logical scalar. 



B.2.2 Class methods 

• ssdist 

SSDIST constructor. 

SSDIST ( ' ' , type [ , matf, logpf] ) creates a single univariate non-Gaussian distribution, type is 
for exponential family distributions and 1 for additive noise distributions, mat f is the function that generates the 
approximated Gaussian variance matrix given observations and signal or disturbances, logpf is the function 
that calculates the log probabiUty of observations given observation and signal or disturbances. If the last two 
arguments are omitted, then the SSDIST is assumed to be variable. SSDIST with multiple non-Gaussian 
distributions can be formed by constructing a SSDIST for each and then block diagonally concatenating them. 

• isdistconst 

ISDISTCONST (• ) retums true if the non-Gaussian part of SSDIST is constant. 

• setdist 

SETD I ST ( • , di St f ) updates the non-Gaussian distributions, di st f is a cell matrix of functions. 

• setgauss 

SETGAUSS (•, eps) or 

[• ytilde] = SETGAUSS (•, y, theta) calculates the approximating Gaussian covariance, the first 
form is used if there are no exponential family distributions, and the data y need to be transformed into y t i Ide 
in the second form if there are. 

• logprobrat 

LOGPROBRAT (•, N, eps) or 

LOGPROBRAT (•, N, y, theta, eps) calculates the log probabiUty ratio between the original non-Gaussian 
distribution and the approximating Gaussian distribution of the observations, it is used when calculating the 
non-Gaussian logUkeUhood by importance sampUng. The first form is used if there are no exponential family 
distributions. N is the number of importance samples. 

B.2.3 Predefined non-Gaussian distributions 

• Poisson distribution: ('poisson' ) 

• Binary distribution: ( ' binary' ) 

• Binomial distribution: ~ ( ' binomial' , k) 

• Negative binomial distribution: ^ ( ' negbinomial' , k) 

k is the number of trials, can be a scalar or row vector. 
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• Exponential distribution: ( ' exp' ) 

• Multinomial distribution: ~ (' multinomial' , h, k) 
h is the number of cells. 

k is the number of trials, can be a scalar or row vector. 

• General exponential family distribution: ~ (' expf ami ly' , b, d2b, id2bdb, c) 

p{y\0) = e^p{y^e-b{9)+c{y)) 

b is the function b{d). 

d2b is b{6), the second derivative of b{0). 

id2bdbis b{0)-^b{9). 

c is the fimction c{y) . 

B.3 SSFUNC 

The class SSFUNC represents state space functions, a set of nonlinear functions that describe state vector transfor- 
mations, and is a child class of SSMAT. Since linear transformations in SSM are represented by left multiplication of 
matrices, it is only possible for nonlinear functions (and hence it's matrix approximation) to be either block diagonally 
or horizontally concatenated. The former concatenation method produces multiple resulting output vectors, whereas 
the latter produces the sum of the output vectors of all functions as output. Vertical concatenation will ignore the 
nonlinear part of SSFUNC. 

B.3.1 Subscripted reference and assignment 



nf 


R 


Number of functions 


A scalar. 




f 


R 


The nonlinear functions 


A cell array of functions. 




df 


R 


Derivative of each function 


A cell array of functions. 




horzmask 


R 


Nonlinear function output masks 


A size (mat, 1) xnfloj 


^ical matrix. 


vertmask 


R 


Nonlinear function input masks 


A size (mat, 2) xnfloj 


'ical matrix. 


fmask 


R 


Variable mask for the functions 


A 1 X nf logical vector. 




const 


R 


True if SSFUNC is constant 


A logical scalar. 





B.3.2 Class methods 

• ssfunc 

SSFUNC constructor. 

S SFUNC ('', p, m[, f, df]) creates a single nonlinear function, p is the input vector dimension and m 
is the output vector dimension. Hence the approximating linear matrix will be p x m. f and df is the nonlinear 
function and its derivative, f maps m x 1 vectors and time t to p x 1 vectors, and df maps m x 1 vectors and 
time t to p X m matrices. If f and df are not both provided, the SSFUNC is assumed to be variable. SSFUNC 
with multiple functions can be constructed by defining individual SSFUNC for each function, and concatenating 
them. 

• isfconst 

ISFCONST (• ) returns true if the nonlinear part of SSFUNC is constant. 
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• getfunc 

GETFUNC(-, X, t) returns f (x, t ), the transform of x at time t . 

• getvertmask 

GETVERTMASK (•) retums vertmask. 

• setfunc 

SETFUNC ( • , f uncf ) updates the nonlinear functions, f uncf is a cell matrix of fimctions. 

• setlinear 

[• c] = SETLINEAR (•, alpha) calculates the approximating linear matrix, c is an additive constant in 
the approximation. 



B.4 SSPARAM 

The class SSPARAM represents state space model parameters. SSPARAM can be seen as a row vector of parameter 
values, hence they are combined by horizontal concatenation with a sUght difference, as described below. 



B.4.1 Subscripted reference and assignment 

. w R Number of parameters A scalar. 

. name R Parameter names A cell array of strings. 

. value RA Transformed parameter values A 1 x w vector. Assigimient cannot change size. 

B.4.2 Class methods 

• ssparam 

SSPARAM constructor. 

SSPARAM(w[, transform]) creates a SSPARAM with w parameters. 

SSPARAM (name [ , transform] ) creates a SSPARAM with parameter names name, which is a cell array 
of strings. 

transform is a cell array of strings describing transforms used for each parameter. 

SSPARAM (name, group, transform) creates a SSPARAM with parameter names specified by name, 
group is a row vector that specifies the number of parameters included in each parameter subgroup, and 
transform is of the same length which specifies transforms for each group. 

• get 

GET ( • ) retums the untransformed parameter values. 

• set 

S E T ( • , V a 1 u e ) sets the untransformed parameter values . 

• remove 

REMOVE ( • , ma s ]c ) removes the parameters specified by ma s Ic . 

• horzcat 

[• pmasl^] = HORZCAT(-i, -2, ...) combines multiple SSPARAM into one, and optionally generates a 
cell array of parameter masks pmaslt, masking each original parameter sets with respect to the new combined 
parameter set. 
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B.5 SSMODEL 

The class SSMODEL represents state space models and embeds all the previous classes. SSMODEL can be combined 
by horizontal or block diagonal concatenation. The former is the basis for additive model combination, where the 
observation is the sum of individual signals. The latter is an independent combination in which the observation is the 
vertical concatenation of the individual signals, which obviously needs to be further combined or changed to be useful. 
Only SSMODEL objects are used directly in data analysis functions, knowledge of embedded objects are needed only 
when defining custom models. For a list of predefined models see appendix [A] 

B.5.1 Subscripted reference and assignment 



name 


RA 


Model name 


A string. 


Hinfo 


R 


Observation disturbance info 


A structure. 


info 


R 


Model component info 


A cell array of structures. 


P 


R 


Model observation dimension 


A scalar 


m 


R 


Model state dimension 


A scalar 


r 


R 


Model state disturbance dimension 


A scalar 


n 


R 


Model time duration 


A scalar. Value is 1 for stationary models. 


H 


R 


Observation disturbance 


A SSMAT or SSDIST. 


Z 


R 


State to observation transform 


A SSMAT or SSFUNC. 


T 


R 


State update transform 


A SSMAT or SSFUNC. 


R 


R 


Disturbance to state transform 


A SSMAT. 


Q 


R 


State disturbance 


A SSMAT or SSDIST. 


c 


R 


State update constant 


A SSMAT. 


al 


RA 


Initial state mean 


A stationary SSMAT, can assign matrices. 


PI 


RA 


Initial state variance 


A stationary SSMAT, can assign matrices. 


sta 


R 


True for stationary SSMODEL 


A logical scalar. 


linear 


R 


True for hnear SSMODEL 


A logical scalar. 


gauss 


R 


True for Gaussian SSMODEL 


A logical scalar. 


w 


R 


Number of parameters 


A scalar 


paramname 


R 


Parameter names 


A cell array of strings. 


param 


RA 


Parameter values 


A 1 X w vector Assignment cannot change w. 


psi 


RA 


Transformed parameter values 


A 1 X w vector Assignment cannot change w. 


q 


R 


Number of initial diffuse elements 


A scalar 



B.5.2 Class Methods 

• ssmodel 

SSMODEL constructor 

SSMODELC', info, H, Z, T, R, Q) creates a constant state space model with complete diffuse ini- 
tialization and c = 0. SSMODEL ('' , info, H, Z, T, R, Q, A, func, grad, psi[, pmask, 
PI, al, c] ) create a state space model, info is a string or a free form structure with field 'type' describing 
the model component. H is a SSMAT or SSDIST. Z is a SSMAT or SSFUNC. T is a SSMAT or SSFUNC. R is a 
SSMAT. Q is a SSMAT or SSDIST. A is a cell array of strings or a single string, each corresponding to func, 
a cell array of functions or a single function. Each string of A is a concatenation of some of ' H ' , ' Z ' , ' T ' , 
' R' , ' Q' , ' c' , ' al' , ' PI' , ' Hd' , ' Zd', ' Td', ' Rd' , ' Qd' , ' cd' , ' Hng' , ' Qng' , ' Znl' , ' Tnl' , 
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representing each part of each element of SSMODEL. For example, ' Qng' specifies the non-Gaussian part of 
Q, and ' Zd' specifies the dynamic part of Z. A complicated example of a string in A is 

['H' repmat {'ng' , 1, gauss) repmat('T', 1, p+P>0) 'RQPl'], 

which is the adjacency string for a possibly non-Gaussian ARIMA components model, f unc is a cell array of 
functions that updates model matrices. If multiple parts of the model are updated the output for each part must 
be ordered as for the adjacency matrix, but parts not updated can be skipped, for example, a function that updates 

H, T and Q must output [Hvec Tvec Qvec] in this order, grad is the derivative of f unc, each of which can 
be [] if not diff ere ntiable. For the example function above the gradient function would output [Hvec Tvec 
Qvec Hgrad Tgrad Qgrad] in order, psi is a SSPARAM. pmask is a cell array of parameter masks for 
the corresponding functions, can be omitted or set to [] if there's only one update function (which presumably 
uses all parameters). PI and al are both stationary SSMAT. c is a SSMAT. All arguments expecting SSMAT 
can also take numeric matrices. 

• isgauss 

ISGAUSS (•) is true for Gaussian SSMODEL. 

• isllnear 

ISLINEAR(-) is true for linear SSMODEL. 

• issta 

ISSTA (•) is true for stationary SSMODEL. Note that all SSFUNC are implicitly assumed to be dynamic, but 
are treated as stationary by this and similar functions. 

• set 

SET(-, A, M, func, grad, psi, pmask) changes one specific element of SSMODEL. A is an adja- 
cency string, specifying which elements the function or functions func updates. A is a adjacency string as 
described earlier, corresponding to func. M should be the new value for one of H, Z, T, R, Q, c, al or P 1, and A 
is constrained to contain only references to that element, grad is the same type as func and is its gradient, set 
individual functions to if gradient does not exist or needed, psi is a SSPARAM that stores all the parameters 
used by functions in func, and pmask is the parameter mask for each function. 

• setparam 

SETPARAM(-, psi, transformed) sets the parameters for SSMODEL. If transformed is true, psi 
is the new value for the transformed parameters, else ps i is the new value for the imtransformed parameters. 

C Function Reference 

C.l User-defined update functions 

This section details the format of update functions for various parts of classes SSMAT, SSDIST and SSFUNC. 

• Stationary part 

vec = func (psi) updates the stationary part of SSMAT. vec must be a column vector such that mat (mmask) 
= vec would correctly update the stationary matrix mat given parameters psi. 
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• Dynamic part 

dsubvec = func(psi) updates the dynamic part of SSMAT. dsubvec must beamatrix such that dvec (dvmask, 
1 : size (dsubvec, 2 ) ) = dsubvec would correctly update the dynamic vector sequence dvec given 
parameters psi. 

• Non-Gaussian part 

distf = func(psi) updates the non-Gaussian part of SSD 1ST. distf must be a cell matrix of function 
handles such that [matf dmask] = distf:, land [logpf dtnask] = distf:, 2 would correctly 
update the distributions given parameters psi. 

• Nonlinear part 

funcf = func (psi) updates the nonlinear part of SSFUNC. funcf must be a cell matrix of function 
handles such that [ffmask] = funcf:, 1 and [dff mask] = funcf:, 2 would correctly update the 
nonUnear functions given parameters psi. 

Note that an update function can return more than one of the four kind of output, thus updating multiple part of a 
SSMODEL, but the order of the output arguments must follow the convention: stationary part of H, Z, T, R, Q, c, al, 
PI, dynamic part of H, Z, T, R, Q, c, non-Gaussian part of H, Q, nonlinear part of z, T, with possible omissions. 

C.2 Data analysis functions 

Most functions in this section accepts analysis settings options, specified as option name and option value pairs (e.g. 
( ' disp' , ' off ) ). These groups of arguments are specified at the end of each fimction that accepts them, and 
are represented by opt in this section. 

• batchsmo 

[alphahat epshat etahat] = BATCHSMO(y, model [, opt] ) performs batch smoothing of mul- 
tiple data sets, y is the data of dimension p x n x N, where n is the data length and N is the number of data 
sets, there must be no missing values, model is a SSMODEL. The output is respectively the smoothed state, 
smoothed observation disturbance and smoothed state disturbance, each of dimensions mxnxN, pxnxN 
and r X n X N. This is equivalent to doing f astsmo on each data set. 

• dlsturbsmo 

[epsliat etahat epsvarhat etavarhat] = DISTURBSMO (y, model [, opt ]) performs dis- 
turbance smoothing, y is the data of dimension p x n, and model is a SSMODEL. The output is respectively 
the smoothed observation disturbance (p x n), smoothed state disturbance (r x n), smoothed observation dis- 
turbance variance (p x p x n or 1 x n if p = 1) and smoothed state disturbance variance (rxrxnorlxn 
if r = 1). 

• estimate 

[model logL output] = ESTIMATE (y, model [, paramO, alphaO, opt] ) estimates the pa- 
rameters of model starting from the initial parameter value paramO. y is the data of dimension p x n, and 
model is a SSMODEL. paramO can be empty if the current parameter values of model is used as initial value, 
and a scalar paramO sets all parameters to the same value. Alternatively paramO can be a logical row vec- 
tor specifying which parameters to estimate, or a 2 x w matrix with the first row as initial value and second 
row as estimated parameter mask. The initial state sequence estimate alphaO is needed only when model is 
non-Gaussian or nonlinear output is a structure that contains optimization routine information, approximated 
observation sequence y if non-Gaussian or nonlinear, and the AlC and BIC of the output model. 
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• fastsmo 

[alphahat epshat etahat] = fastsmo(y, model [, opt]) performs fast smoothing, y is the 
data of dimension p x n, and model is a SSMODEL. The output is respectively the smoothed state (m x n), 
smoothed observation disturbance (p x n), and smoothed state disturbance (r x n). 

• gauss 

[model ytilde] = GAUSS (y, model [, alphaO, opt]) calculates the Gaussian approximation, 
y is the data of dimension p x n, and model is a SSMODEL. alphaO is the initial state sequence estimate and 
can be empty or omitted. 

• kalman 

[a P V F] = KALMAN (y, model [, opt ]) performs Kalman filtering, y is the data of dimension p x 
n, and model is a SSMODEL. The output is respectively the filtered state (m x n+1), filtered state variance 
(m x m x n+1, or 1 x n+1 if m = 1), one-step prediction error (innovation) (p x n), one-step prediction variance 
(p X p X n, or 1 X n if p = 1). 

• linear 

[model ytilde] = LINEAR (y, model [, alpliaO, opt]) calculates the Unear approximation, y 
is the data of dimension p x n, and model is a SSMODEL. alphaO is the initial state sequence estimate and 
can be empty or omitted. 

• loglik 

LOGLIK(y, model [, ytilde, opt]) returns the log likelihood of model given y. y is the data of 
dimension p x n, and model is a SSMODEL. ytilde is the approximating observation y and is needed for 
some non-Gaussian or nonlinear models. 

• sample 

[y alpha eps eta] = SAMPLE (model, n[, N] ) generates observation samples frommodel. model 
is a SSMODEL, n specifies the sampling data length, and N specifies how many sets of data to generate, y is 
the sampled data of dimension p x n x N, alpha, eps, eta are respectively the corresponding sampled state 
(m X n X N), observation disturbance (p x n x N), and state disturbance (r x n x N). 

• signal 

SIGNAL (alpha, model) generates the signal for each component according to the state sequence and 
model specification, alpha is the state of dimension m x n, and model is a SSMODEL. The output is a cell 
array of data each with dimension p x n, or a M x n matrix where M is the number of components if p = 1. 

• simsmo 

[alphatilde epstilde etatilde] = SIMSMO(y, model, N[, antithetic, opt]) gen- 
erates observation samples from model conditional on data y. y is the data of dimension p x n, and model 
is a SSMODEL. antithetic should be set to I if antithetic variables are used. The output is respectively the 
sampled state sequence (m x n x N), sampled observation disturbance (p x n x N), and sampled state disturbance 
(r X n X N). 

• statesmo 

[alphahat V] = STATESMO (y, model [, opt]) performs state smoothing, y is the data of dimen- 
sion p X n, and model is a SSMODEL. The output is respectively the smoothed state (m x n), smoothed state 
variance (m x m x n, or 1 x n if p = 1). If only the first output argument is specified, fast state smoothing is 
automatically performed instead. 
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• arlmaselect 

[p d q mean] = ARIMASELECT (y ) or 

[p d q P D Q mean] = ARIMASELECT (y, s) performs TRAMO model selection, y is the data of 
dimension p x n, and s is the seasonal period. The first form selects among ARIMA models with or without a 
mean. The second form selects among SARllMA models with or without a mean. 

• armadegree 

[p q] = ARMADEGREE ( y [ , mean, mr] ) or 

[p q P Q] = ARMADEGREE (y, s[, mean, mr, ms ]) determines the degrees of ARMA or SARMA 
from observation y. y is the data of dimension p x n, and s is the seasonal period, mean is true if models with 
mean are used for the selection, mr and ms is the largest degree to consider for regular ARMA and seasonal 
ARMA respectively. 

• dlff degree 

[d mean] = DIFFDEGREE (y [ , ub, tsig] ) or 

[d D mean] = DIFFDEGREE (y, s[, ub, tsig]) performs unit root detection. The first form only 
considers regular differencing, y is the data of dimension p x n, s is the seasonal period, ub is the threshold 
for unit roots, default is [0.97 0.88], and t s i g is the t- value threshold for mean detection, default is 1 . 5 . 

• htd 

[theta ksivar] = HTD (d, D, s. Phi, Theta[, etavar] ) performs HiUmer-Tiao decompo- 
sition, d is the order of regular differencing, D is the order of seasonal differencing, s is the seasonal period, 
Phi is a cell array of increasing order polynomials representing autoregressive factors for each component, 
Theta is an increasing order polynomial representing the moving average factor, etavar is the disturbance 
variance. The output theta is a cell array of decomposed moving average factors for each component, and 
ksivar is the corresponding disturbance variance for each component (including the irregular variance at the 
end). 

• ssmhtd 

SSMHTD (model ) performs Hillmer-Tiao decomposition on SSMODEL model and returns an ARIMA com- 
ponent model. This is a wrapper for the function htd. 

• loglevel 

LOGLEVEL (y, s) or 
LOGLEVEL (y, p, q) or 

LOGLEVEL (y, p, d, q[, P, D, Q, s]) determines the log level specification with respect to various 
models, y is the observation data, s is the seasonal period. The output is 1 if no logs are needed, else 0. The 
first form uses the airUne model with specified seasonal period, second form uses ARMA models, and the third 
form uses SARIMA models. 

• oosforecast 

[yf err SS] = OOSFORECAST (y, model, nl, h) performs out-of-sample forecast, y is the data 
of dimension p x n, model is a SSMODEL, nl is the number of time points to exclude at the end, and h is the 
number of steps ahead to forecast, which can be an array. The output yf is the forecast obtained, err is the 
forecast error, and SS is the forecast error cumulative sum of squares. 
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C.3 Stock element functions 

• fun arma 

[ fun grad ps i ] = FUN JVRMA (p , q) creates matrix update functions for ARMA models. 

• fun_cycle 

[ fun grad ps i ] = FUN_CYCLE ( ) creates matrix update functions for cycle components. 

• fun_dupvar 

[fun grad psi] = FUN_DUPVAR (p, cov, d[, name]) creates matrix update functions for dupli- 
cated variances. 

• fun genair 

[fun grad psi] = FUN_GENAIR (nparam, nfreq, freq) creates matrix update functions for gen- 
eralized airline models. 

• fun_homovar 

[fun grad psi] = FUN_HOMOVAR (p, cov, q[, name]) creates matrix update functions for ho- 
mogeneous variances. 

• fun_interlvar 

[fun grad psi] = FUN_INTERLVAR (p, q, cov[, name]) creates matrix update functions for q- 
interleaved variances. 

• funjnvcycle 

[fun grad psi] = FUN jyiVCYCLE (p) creates matrix update functions for multivariate cycle component. 

• fun_oneoverf 

[fun grad psi] = FUN_ONEOVERF (m) creates matrix update functions for 1/f noise. 

• fun_sarimahtd 

[fun grad psi] = FUN_SARIMAHTD (p, d, q, P, D, Q, s, gauss) creates matrix update func- 
tions for SARIMA with HiUmer-Tiao decomposition and non-Gaussian irregular. 

• fun s arma 

[fun grad psi] = FUN_SARMA(p, q, P, Q, s ) creates matrix update functions for seasonal ARMA 
models. 

• fun spline 

[fun grad psi] = FUN_SPLINE (delta) creates matrix update functions for the cubic spUne models. 

• f un var 

[fun grad psi] = FUN_VAR(p, cov[, name]) creates matrix update functions for variances. 

• fun wvar 

[fun grad psi] = FUN_WVAR{p, s[, name]) creates matrix update functions for W structure vari- 
ances. 



• mat-arima 

[Z T R PI Tmmaslc Rmmask PlmmasJc] = MAT_ARIMA(p, d, q, mean) creates base matrices for 
ARIMA models. 
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• mat-arma 

[Z T R PI Tmmask Rmmask Plmmask] = MAT_A.RMA(p, q, mean) or 

[T R PI Tmmask Rmmask Plmmask] = MAT_A.RMA(p, q, mean) creates base matrices for ARMA 
models. 

• mat-commonlvls 

[Z T R al PI] = MAT_COMMONLVLS (p. A, a ) creates base matrices for common levels models. 

• mat-dvunmy 

[Z T R] = MAT_DUMMY ( s [ , fixed]) creates base matrices for dummy seasonal component. 

• mat dupvar 

[m mmask] = MAT.DUPVAR (p, cov, d) creates base matrices for duplicated variances. 

• mat homovar 

[H Q Hmmask Qmmask] = MAT_HOMOVAR (p, cov, q) creates base matrices for homogeneous vari- 
ances. 

• mat hs 

[Z T R] = MAT_HS(s) creates base matrices for Harrison and Stevens seasonal component. 

• mat_interlvar 

[m mmask] = MAT_INTERLVAR (p, q, cov) creates base matrices for q-interleaved variances. 

• mat.lpt 

[Z T R] = MAT_LPT (d, stochastic) creates base matrices for local polynomial trend or integrated 
random walk models. 

• matjnvcycle 

[Z T Tmmask R] = MATjyiVCYCLE (p) creates base matrices for multivariate cycle components. 

• mat_mvdvmimy 

[Z T R] = MATjyiVDUMMY (p, s[, fixed]) creates base matrices for multivariate dummy seasonal 
components. 

• mat_mvhs 

[Z T R] = MATjyiVHS (p, s) creates base matrices for multivariate Harrison and Stevens seasonal com- 
ponents. 

• matjnvintv 

[Z Zdmmask Zdvec T R] = MATjyiVINTV (p, n, type, t au ) creates base matrices for multivari- 
ate intervention components. 

• matjnvllt 

[Z T R] = MAT jyiVLLT (p) creates base matrices for multivariate local level trend models. 

• mat_mvreg 

[Z Zdmmask Zdvec T R] = MATjyiVREG (p, x, dep) creates base matrices for multivariate regres- 
sion components. 
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• mat-mvtrig 

[Z T R] = MATjyiVTRIG (p, s[, fixed]) creates base matrices for multivariate trigonometric sea- 
sonal components. 

• mat_reg 

[Z Zdminask Zdvec T R] = MAT_REG(x[, dyn] ) creates base matrices for regression components. 

• mat sarima 

[Z T R PI Tmmask Rmmask Plmmask] = MAT.SARIMA (p, d, q, P, D, Q, s, mean) or 
[Z T R PI Rmmask Plmmask] = MAT_SARIMA (p, d, q, P, D, Q, s, mean) creates base ma- 
trices for SARIMA models. 

• mat-sarimahtd 

[Z T R Qmat PI Tmmask Rmmask Qmmask Plmmask] = MAT.SARIMAHTD (p, d, q, P, D, 
Q , s ) creates base matrices for SARIMA with Hillmer-Tiao decomposition. 

• mat-spline 

[Z T Tdmmask Tdvec R] = MAT_ (delta) creates base matrices for the cubic spline model. 

• mat sumarma 

[Z T R PI Tmmask Rmmask Plmmask] = MAT.SUMARMA (p, q, D, s, mean) creates base ma- 
trices for sum integrated ARMA models. 

• mat trig 

[Z T R] = MAT_TRIG(s[, fixed] ) creates base matrices for trigonometric seasonal components. 

• mat var 

[m mmask] = MAT_VAR(p, cov) creates base matrices for variances. 

• mat_wvar 

[m mmask] = MAT_WVAR(p, s) creates base matrices for W structure variances. 

• ngdist_binomial 

[matf logpf] = NGDIST_BINOMIAL (k) creates base functions for binomial distributions. 

• ngdist expfamily 

[matf logpf] = NGDIST_EXPFAMILY (b, d2b, id2bdb, c) creates base functions for exponen- 
tial family distributions. 

• ngdist-multlnomial 

[matf logpf] = NGDISTjyiULTINOMIAL (h, k) creates base functions for multinomial distributions. 

• ngdist negbinomial 

[matf logpf] = NGDIST_NEGBINOMIAL (k) creates basefimctionsfornegativebinomialdistributions. 

• ngfun.t 

[fun psi] = NGFUN_T ( [nu] ) creates update functions for the t-distribution. 

• psi2err 

distf = PSI2ERR(X) is the update function for the general error distribution. 
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• psl2inix 

distf = PS 1 2MI X (X) is the update function for the Gaussian mixture distribution. 

• psl2zinsv 

distf = PSI2ZMSV (X) is the update function for the zero-mean stochastic volatility model disturbance 
distribution. 

• x_ee 

X_EE(Y, M, d) generates Easter effect variable. 

• x_intv 

X_INTV{n, type, tau) generates intervention variables . 

• x_ly 

X_LY(Y, M) generates leap-year variable. 

• x_td 

X_TD (Y, M, td6) generates trading day variables. 
C.4 Miscellaneous helper functions 

• diaginprod 

Y = D lAGINPROD (A, xl [ , x2 ] ) calculates the irmer product of a vector sequence. A is a m x m square 
matrix, xl and x2 are m x n matrices representing a set of n colunm vectors each of size m x 1. The {i, j)th 
element of y is equal to (xl (: , j)— x2(:, i))'^A(xl(:, j)— x2(:, i)). 

• f_alpha2arma 

[xAR xMA] = FJ^LPHA2ARMA (n, m, alpha, sigma2) generates 1/f noise by AR and MA approx- 
imations, n is length of series to generate, m is number of terms to use in the approximation, alpha is the 
exponent of inverse frequency of the 1/f noise, and sigma2 the variance of the 1/f noise. The output is the 1/f 
noise generated by AR and MA approximation respectively. 

• meancov 

[m C] = MEANCOV (x) calculates the mean and covariance of each vector set in a sequence, xisam xnx A'' 
matrix representing n vector sets each of size N containing to x 1 vectors, m is a to x n matrix, the mean of 
each of the n vector sets. CisamXTOXn matrix, the covariance of each of the n vector sets. 

• randarma 

RAND ARMA ( n , r ) generates random ARMA polynomials, n is the polynomial degree and r is root magnitude 
range. 

• setopt 

SETOPT (optnamel, optvaluel, optname2, optvalue2, ...) sets the global analysis settings 
and returns a structure containing the new settings, thus a call with no arguments just returns the current settings. 
The available options are: 'disp' can be set to ' of f , 'notify', 'final' or 'iter', tol is the 
tolerance, fmin is the minimization algorithm to use (' bf gs ' or ' simplex' ). maxiter is the maximum 
number of iterations, n s amp is the number of samples to use in simulation. Note that the same form of argument 
sequence can be used at the end of most data analysis functions to specify one time overrides. 



46 



Peng & Aston 



• ymarray 

[Y M] = YMARRAY (y, m, N ) generates an array of years and months from specified Starting year y, start- 
ing month m, and number of months N. 



